NetNews Offline 2

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Offline 2 / NetNews Offline Volume 2.iso / news / comp / lang / c-part1 / 6024 < prev next >

Wrap

Internet Message Format | 1996-08-05 | 1.8 KB

Path: keats.ugrad.cs.ubc.ca!not-for-mail From: c2a192@ugrad.cs.ubc.ca (Kazimir Kylheku) Newsgroups: comp.lang.c Subject: Re: unique id for a string Date: 21 Feb 1996 10:57:50 -0800 Organization: Computer Science, University of B.C., Vancouver, B.C., Canada Message-ID: <4gfpveINNe68@keats.ugrad.cs.ubc.ca> References: <1996Feb16.175601.114182@kuhub.cc.ukans.edu> <3129dd39.961626@news.iquest.net> NNTP-Posting-Host: keats.ugrad.cs.ubc.ca In article <3129dd39.961626@news.iquest.net>, Robert B. Clark <rclark@iquest.net> wrote: >On 16 Feb 96 17:56:01 CST, anh@kuhub.cc.ukans.edu wrote: > >>HELLO = H * 26^4 + E * 26 ^3 + L * 26^2 + L * 26^1 + E * 26^0 >> >>But the id is way too big. I need something that fits within a 32-bits int >>or a 64-bits long. > >You're thinking way too hard, perhaps. Look up the algorithm for CRCs >(cyclic redundancy checks/codes) in Snippets or most any other popular >programmer's reference. But he wants a _unique_ ID. This is not possible, since the space of possible words is much larger than the space of 32- or even 64-bit integers. Even for a limited dictionary of around 200,000 words, a function that _guarantees_ a unique hash for each indentifier into a 32-bit space is difficult to find. The problem is that the poster did not give enough context about the actual problem he is trying to solve (unless this is just a homework exercise to find such a mapping). Perhaps the problem can be effectively solved without having to compute such a mapping, which is probably of little worth. One way to get a unique mapping is to simply assign increasing numbers to the 200,000+ words, and use a hashing technique to quickly look up a word given its integer tag, and look up the tag given the word. With the proper represenatation, both operations can be done in "constant time". --